Fix user text message deduplication to keep best version #65

cboos · 2025-12-17T16:27:25Z

Summary

Fix user message deduplication, as in some cases it didn't happen; we preserve the version with most content items (instead of arbitrarily keeping the first one)
Add cache breaking change marker for IDE tag transformation introduced in v0.9.0
Fix flaky browser tests by using persistent CDN cache directory
Move ty config from standalone file into pyproject.toml

Background

User text messages with the same timestamp but different UUIDs can occur when resuming a Claude Code session after a git branch switch. Claude Code appears to re-log the user input multiple times, with each entry getting a different UUID but the same timestamp. The content gets split inconsistently across these entries - one might have just the IDE notification (<ide_opened_file>), another might have just the user's actual text, and one might have both combined.

Previously, deduplication would not happen in this case. Now it keeps the version with the most content items, ensuring both IDE notifications and actual user messages are preserved.

Test plan

Run uv run pytest test/test_version_deduplication.py -v for new deduplication tests
Run just test-browser to verify browser test stability
Verify cache invalidation works correctly after upgrade

Notes

The deduplication issue was discovered in session 741790a4-4fe2-4644-9a51-fb4482074060.jsonl from the coderabbit-review-helper project (cc @daaain).

🤖 Generated with Claude Code

Summary by CodeRabbit

Bug Fixes
- Improved message deduplication to keep the most complete user message.
- Cache compatibility updated so older caches (0.8.x) are invalidated when upgrading to 0.9.x.
Rendering
- Deduplication notices can include clickable links to the original message.
- Message element IDs standardized (removed pair suffixes), updating rendered anchors.
Tests
- Increased timeline load timeout and enabled persistent browser caching to reduce CDN flakiness.
Chores
- Removed legacy type-checker config and added Playwright cache to ignore patterns.
Public API
- Exposed a distinct UserTextContent type for user messages.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

Caches from v0.8.0 and earlier need regeneration because v0.9.0 introduced _compact_ide_tags_for_preview() which transforms first_user_message to use emoji indicators instead of raw IDE tags. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Browser tests were failing inconsistently due to vis-timeline CDN load times varying from 2s to 30s+. The 10s timeout caused random failures, especially with parallel test execution where each xdist worker started with a cold cache. Changes: - Use persistent browser context with fixed cache directory instead of per-run temp directories, allowing CDN resources to stay cached - Increase vis-timeline load timeout from 10s to 30s as fallback for cold loads - Add .playwright_cache/ to .gitignore With these changes, browser tests pass consistently (28 passed in ~13s vs previously 2-6 random failures per run). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

User text messages with the same timestamp but different UUIDs (can occur during git branch switches) were being deduplicated but always kept the first occurrence, regardless of content quality. Now the deduplication keeps the version with the most content items, ensuring IDE notifications and actual user messages are both preserved when split across multiple entries. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

coderabbitai · 2025-12-17T16:27:37Z

Walkthrough

Adds Playwright cache ignore, enforces a cache-compatibility cutoff at 0.9.0, swaps TypeGuard helpers for optional-casting helpers and updates renderer usage, implements in-place user-text deduplication, adds persistent per-worker Playwright fixtures and longer timeline timeout, removes ty.toml, and updates tests and HTML id/dedup linking.

Changes

Cohort / File(s)	Summary
VCS & Config `/.gitignore`, `/(removed) ty.toml`	Added `.playwright_cache/` to `.gitignore`; removed `ty.toml` type-checker config.
Cache & Tests `claude_code_log/cache.py`, `test/test_cache.py`	Added breaking-change mapping `"0.8.0" → "0.9.0"` to cache compatibility logic; replaced/updated test to assert the new cutoff behavior.
Parser API `claude_code_log/parser.py`	Replaced `is_user_entry`/`is_assistant_entry` TypeGuard helpers with `as_user_entry`/`as_assistant_entry` that return optionals (casts or None).
Renderer updates `claude_code_log/renderer.py`, `claude_code_log/html/system_formatters.py`, `claude_code_log/html/templates/transcript.html`	Updated imports to use `as_*` helpers; added `_resolve_dedup_targets(messages)`; adjusted dedup notice shape and rendering to optionally link to `#msg-{target_message_id}`; standardized message div ids to `msg-<message_id>` (removed pair suffix).
Deduplication logic & tests `claude_code_log/converter.py`, `test/test_version_deduplication.py`, `test/test_sidechain_agents.py`	Reworked `deduplicate_messages` to deduplicate user-text messages in-place using a dedup_key→index map and prefer entries with more content; added tests validating user-text deduplication and updated expectations/assertions for dedup notice/linking.
Models & Test imports `claude_code_log/models.py`, `test/test_ide_tags.py`	Added fields (`target_uuid`, `target_message_id`, `original_text`) to `DedupNoticeContent`; exported `UserTextContent` for tests and updated assertions.
Playwright test infra & timing `test/conftest.py`, `test/test_timeline_browser.py`	Added persistent per-worker Playwright fixtures (`_browser_user_data_dir`, `_persistent_context`, `context`, `page`) to share HTTP cache across tests; increased timeline wait timeout from 10s to 30s.
Snapshots `test/__snapshots__/test_snapshot_html.ambr`	Snapshot updates reflect standardized message ids (removed `-first`/`-last` suffixes); no content or structure changes beyond id renaming.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Areas requiring extra attention:
- claude_code_log/converter.py — correctness of dedup_key→index map, in-place replacement behavior, ordering/stability edge cases.
- claude_code_log/parser.py ↔ claude_code_log/renderer.py — all call sites correctly handle None returns from as_* helpers.
- claude_code_log/renderer.py — _resolve_dedup_targets correctness and interactions with hierarchy assignment and template message IDs.
- Tests/playwright fixtures — lifecycle/isolation of persistent contexts across xdist workers and potential test flakiness changes.
- HTML id changes and dedup link anchors — ensure anchors (#msg-...) resolve and snapshot updates are consistent.

Possibly related PRs

Introduce visual structure for the conversation and some specialized tool rendering #37 — Overlaps renderer changes and dedup/target-resolution fields and logic.
Review and polish (0.8dev) #51 — Prior edits to deduplication logic and tests; directly related to converter.py changes.
Foldable messages #42 — Related renderer/template deduplication and message id/linking modifications.

Poem

🐰
I tuck a cache where playwright naps,
I swap my guards for gentler maps,
I mend duplicate lines and keep the best,
I share the crumbs and let timeouts rest,
A little hop — the log looks blessed.

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'Fix user text message deduplication to keep best version' directly and clearly describes the primary change: improving user message deduplication logic to retain the version with the most content.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch dev/fix-regressions

📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3e9c665 and e088d69.

📒 Files selected for processing (6)

claude_code_log/html/system_formatters.py (1 hunks)
claude_code_log/html/templates/transcript.html (1 hunks)
claude_code_log/models.py (1 hunks)
claude_code_log/renderer.py (8 hunks)
test/__snapshots__/test_snapshot_html.ambr (16 hunks)
test/test_sidechain_agents.py (2 hunks)

✅ Files skipped from review due to trivial changes (1)

test/snapshots/test_snapshot_html.ambr

🚧 Files skipped from review as they are similar to previous changes (1)

claude_code_log/models.py

🧰 Additional context used

📓 Path-based instructions (4)

test/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Organize tests into categories with pytest markers to avoid async event loop conflicts: unit tests (no mark), TUI tests (@pytest.mark.tui), browser tests (@pytest.mark.browser), and snapshot tests

Files:

test/test_sidechain_agents.py

**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use ruff for code formatting and linting, with ruff check --fix for automatic fixes
Use pyright and mypy for type checking in Python code
Target Python 3.10+ with support for modern Python features and type hints

Files:

test/test_sidechain_agents.py
claude_code_log/html/system_formatters.py
claude_code_log/renderer.py

claude_code_log/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use dateparser for natural language date parsing to support date range filtering with expressions like 'today', 'yesterday', 'last week', and relative dates

Files:

claude_code_log/renderer.py

claude_code_log/renderer.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use mistune for quick Markdown rendering with syntax highlighting support in server-side template rendering

Files:

claude_code_log/renderer.py

🧠 Learnings (5)

📓 Common learnings

Learnt from: CR
Repo: daaain/claude-code-log PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-30T17:16:32.495Z
Learning: Applies to test/**/*browser*.py : Use Playwright for browser-based testing of interactive HTML features, particularly for testing the timeline visualization and filter controls

📚 Learning: 2025-11-30T23:24:07.840Z

Learnt from: cboos
Repo: daaain/claude-code-log PR: 54
File: claude_code_log/renderer.py:2912-2945
Timestamp: 2025-11-30T23:24:07.840Z
Learning: In claude_code_log/renderer.py, the agentId field is currently only set on Task tool_result messages, not on tool_use messages, because the agentId is generated after the tool_use is logged. The _reorder_sidechain_template_messages function relies on this to avoid duplicate sidechain insertion.

Applied to files:

test/test_sidechain_agents.py
claude_code_log/html/templates/transcript.html

📚 Learning: 2025-12-09T23:52:47.578Z

Learnt from: daaain
Repo: daaain/claude-code-log PR: 59
File: test/test_cache.py:135-165
Timestamp: 2025-12-09T23:52:47.578Z
Learning: SQLite supports NULLS FIRST and NULLS LAST in ORDER BY since v3.30.0 (Oct 2019). Do not flag SQL that uses these clauses as an error when reviewing Python tests or code that interacts with SQLite. If reviewing SQL strings, verify the target SQLite version supports NULLS FIRST/LAST and ensure the syntax is used correctly for the database in use.

Applied to files:

test/test_sidechain_agents.py

📚 Learning: 2025-11-30T17:16:32.495Z

Learnt from: CR
Repo: daaain/claude-code-log PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-30T17:16:32.495Z
Learning: Applies to claude_code_log/templates/**/*.html : Use Jinja2 templates for HTML generation, including session navigation with table of contents, message rendering with different content types, and token display for individual messages and session totals

Applied to files:

claude_code_log/html/templates/transcript.html

📚 Learning: 2025-11-30T17:16:32.495Z

Learnt from: CR
Repo: daaain/claude-code-log PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-30T17:16:32.495Z
Learning: Applies to claude_code_log/renderer_timings.py : Implement performance timing instrumentation via the CLAUDE_CODE_LOG_DEBUG_TIMING environment variable to identify performance bottlenecks in rendering phases, with detailed timing for initialization, deduplication, session summary processing, main message loop, and template rendering

Applied to files:

claude_code_log/renderer.py

🧬 Code graph analysis (1)

claude_code_log/renderer.py (3)

claude_code_log/parser.py (2)

as_assistant_entry (487-491)

as_user_entry (480-484)

claude_code_log/models.py (1)

DedupNoticeContent (464-474)

claude_code_log/utils.py (2)

should_use_as_session_starter (110-129)

create_session_preview (136-156)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)

GitHub Check: test (ubuntu-latest, 3.12)
GitHub Check: test (ubuntu-latest, 3.10)
GitHub Check: test (ubuntu-latest, 3.11)
GitHub Check: test (windows-latest, 3.11)
GitHub Check: test (ubuntu-latest, 3.13)
GitHub Check: test (windows-latest, 3.13)
GitHub Check: test (windows-latest, 3.10)
GitHub Check: test (ubuntu-latest, 3.14)
GitHub Check: test (windows-latest, 3.12)
GitHub Check: test (windows-latest, 3.14)

🔇 Additional comments (11)

claude_code_log/html/templates/transcript.html (1)

106-106: LGTM! Simplified message IDs enable consistent anchor linking.

Removing the pair_role suffix from DOM IDs while keeping it in CSS classes is the right approach. This ensures dedup notice links (href="#msg-{target_message_id}") work correctly regardless of whether the target message is paired.

claude_code_log/html/system_formatters.py (1)

105-113: LGTM! Clean implementation of optional anchor linking for dedup notices.

The conditional anchor wrapping is well-implemented. The target_message_id is generated internally by _build_message_hierarchy (format: d-N or session-{id}), so not escaping it is safe.

test/test_sidechain_agents.py (2)

75-79: LGTM! Test assertions correctly updated to match new dedup notice format.

The assertions now verify:

The simplified notice text ("Task summary", "see result above")

The anchor link presence (href="#msg-")

This aligns with the DedupNoticeContent creation in renderer.py line 1650.

115-115: LGTM!

The negative assertion correctly verifies that "Task summary" does not appear when deduplication should not occur.

claude_code_log/renderer.py (7)

42-50: LGTM! Clean import refactoring.

Switching from is_user_entry/is_assistant_entry to as_user_entry/as_assistant_entry helpers enables more idiomatic Python with walrus operators while providing type-safe access to entry data.

502-504: LGTM! Correct placement in the processing pipeline.

_resolve_dedup_targets is correctly called after _build_message_hierarchy (which assigns message_id values) and before _mark_messages_with_children. The timing wrapper follows the existing pattern.

1649-1656: LGTM! Well-structured dedup notice with target tracking.

The DedupNoticeContent now captures:

target_uuid: Links to the Task result for later resolution

original_text: Preserves the duplicated content for debugging

The raw_text_content = None assignment correctly clears the dedup flag after processing.

1671-1686: LGTM! Efficient two-pass resolution of dedup targets.

The implementation correctly:

Builds a uuid -> message_id mapping in a single pass

Resolves target_uuid to target_message_id for dedup notices

Uses safe .get() lookup that returns None if the target UUID isn't found

The in-place mutation of msg.content.target_message_id works because DedupNoticeContent (Pydantic model) is mutable.

1823-1845: LGTM! Cleaner session info collection using helper functions.

Using as_user_entry(message) as a truthy check is cleaner than the previous approach. The text_content variable (extracted at line 1812) is correctly reused for both should_use_as_session_starter and create_session_preview calls.

1856-1869: LGTM! Idiomatic walrus operator pattern for token usage extraction.

The if assistant_entry := as_assistant_entry(message): pattern cleanly:

Checks if the message is an assistant entry

Binds the typed entry for immediate use

Provides direct access to message, requestId, and uuid fields

2037-2054: LGTM!

Consistent application of the same as_assistant_entry pattern in the rendering pass for token usage display.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

pyproject.toml (1)

82-86: Configuration consolidation looks good; use per-file overrides to scope suppression.

Moving the ty configuration into pyproject.toml is a good practice for consolidating project settings. The comment clearly explains the TypeGuard limitation.

However, this global suppression applies to all files and could mask legitimate "possibly-missing-attribute" issues elsewhere in the codebase. ty supports per-file overrides, which would allow more precise scoping. Instead of a global rule, use a [[tool.ty.overrides]] section with glob patterns to apply different rule configurations to specific files or directories containing is_user_entry() and is_assistant_entry(). Alternatively, use inline suppression comments like # ty: ignore[possibly-missing-attribute] at the specific locations where the false positive occurs.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1807c11 and bdc1c49.

📒 Files selected for processing (9)

.gitignore (1 hunks)
claude_code_log/cache.py (1 hunks)
claude_code_log/converter.py (3 hunks)
pyproject.toml (1 hunks)
test/conftest.py (1 hunks)
test/test_cache.py (1 hunks)
test/test_timeline_browser.py (1 hunks)
test/test_version_deduplication.py (1 hunks)
ty.toml (0 hunks)

💤 Files with no reviewable changes (1)

ty.toml

🧰 Additional context used

📓 Path-based instructions (5)

claude_code_log/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use dateparser for natural language date parsing to support date range filtering with expressions like 'today', 'yesterday', 'last week', and relative dates

Files:

claude_code_log/cache.py
claude_code_log/converter.py

**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use ruff for code formatting and linting, with ruff check --fix for automatic fixes
Use pyright and mypy for type checking in Python code
Target Python 3.10+ with support for modern Python features and type hints

Files:

claude_code_log/cache.py
test/test_timeline_browser.py
test/test_cache.py
test/conftest.py
test/test_version_deduplication.py
claude_code_log/converter.py

test/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Organize tests into categories with pytest markers to avoid async event loop conflicts: unit tests (no mark), TUI tests (@pytest.mark.tui), browser tests (@pytest.mark.browser), and snapshot tests

Files:

test/test_timeline_browser.py
test/test_cache.py
test/conftest.py
test/test_version_deduplication.py

test/**/*browser*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use Playwright for browser-based testing of interactive HTML features, particularly for testing the timeline visualization and filter controls

Files:

test/test_timeline_browser.py

pyproject.toml

📄 CodeRabbit inference engine (CLAUDE.md)

Use uv for dependency management with commands like 'uv add' to add dependencies and 'uv remove' to remove dependencies

Files:

pyproject.toml

🧠 Learnings (6)

📚 Learning: 2025-11-30T17:16:32.495Z

Learnt from: CR
Repo: daaain/claude-code-log PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-30T17:16:32.495Z
Learning: Applies to test/**/*browser*.py : Use Playwright for browser-based testing of interactive HTML features, particularly for testing the timeline visualization and filter controls

Applied to files:

.gitignore
test/test_timeline_browser.py
test/conftest.py

📚 Learning: 2025-12-09T23:52:47.578Z

Learnt from: daaain
Repo: daaain/claude-code-log PR: 59
File: test/test_cache.py:135-165
Timestamp: 2025-12-09T23:52:47.578Z
Learning: SQLite supports NULLS FIRST and NULLS LAST in ORDER BY since v3.30.0 (Oct 2019). Do not flag SQL that uses these clauses as an error when reviewing Python tests or code that interacts with SQLite. If reviewing SQL strings, verify the target SQLite version supports NULLS FIRST/LAST and ensure the syntax is used correctly for the database in use.

Applied to files:

test/test_timeline_browser.py
test/test_cache.py
test/conftest.py
test/test_version_deduplication.py

📚 Learning: 2025-11-30T17:16:32.495Z

Learnt from: CR
Repo: daaain/claude-code-log PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-30T17:16:32.495Z
Learning: Applies to **/*.py : Use ruff for code formatting and linting, with ruff check --fix for automatic fixes

Applied to files:

pyproject.toml

📚 Learning: 2025-11-30T17:16:32.495Z

Learnt from: CR
Repo: daaain/claude-code-log PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-30T17:16:32.495Z
Learning: Applies to **/*.py : Use pyright and mypy for type checking in Python code

Applied to files:

pyproject.toml

📚 Learning: 2025-11-30T17:16:32.495Z

Learnt from: CR
Repo: daaain/claude-code-log PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-30T17:16:32.495Z
Learning: Applies to **/*.py : Target Python 3.10+ with support for modern Python features and type hints

Applied to files:

pyproject.toml

📚 Learning: 2025-11-30T17:16:32.494Z

Learnt from: CR
Repo: daaain/claude-code-log PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-30T17:16:32.494Z
Learning: Separate TUI tests using run_test() from browser tests using Playwright and unit tests to avoid 'RuntimeError: This event loop is already running' conflicts due to incompatible async frameworks

Applied to files:

test/conftest.py

🧬 Code graph analysis (4)

test/test_timeline_browser.py (1)

test/conftest.py (1)

page (89-93)

test/test_cache.py (1)

claude_code_log/cache.py (1)

_is_cache_version_compatible (411-451)

test/test_version_deduplication.py (3)

claude_code_log/models.py (3)

TextContent (682-684)

UserTranscriptEntry (812-816)

UserMessage (749-752)

claude_code_log/converter.py (1)

deduplicate_messages (310-394)

claude_code_log/html/renderer.py (1)

generate_html (305-314)

claude_code_log/converter.py (1)

claude_code_log/models.py (2)

SummaryTranscriptEntry (825-830)

UserTranscriptEntry (812-816)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)

GitHub Check: test (windows-latest, 3.11)
GitHub Check: test (windows-latest, 3.12)
GitHub Check: test (windows-latest, 3.13)
GitHub Check: test (ubuntu-latest, 3.13)
GitHub Check: test (ubuntu-latest, 3.10)
GitHub Check: test (ubuntu-latest, 3.14)
GitHub Check: test (windows-latest, 3.14)
GitHub Check: test (ubuntu-latest, 3.12)
GitHub Check: test (ubuntu-latest, 3.11)
GitHub Check: test (windows-latest, 3.10)

🔇 Additional comments (15)

.gitignore (1)

53-53: LGTM!

The addition of .playwright_cache/ is appropriate for ignoring the persistent Playwright browser cache directory introduced in test/conftest.py. The placement in the test artifacts section is logical.

claude_code_log/converter.py (4)

321-322: LGTM!

The docstring addition clearly explains the new deduplication behavior for user text messages with identical timestamps but different UUIDs, addressing the branch switch artifact scenario mentioned in the PR objectives.

332-332: LGTM!

The change from a set to a dict mapping dedup_key to list index is necessary to enable in-place replacement of duplicate messages. This allows the deduplicator to replace earlier entries with later, more complete versions.

359-372: LGTM!

The logic correctly identifies user text messages by checking for the absence of ToolResultContent. The use of for-else is appropriate here: the else block executes only when no ToolResultContent is found (loop completes without break), correctly setting is_user_text = True and leaving content_key empty for deduplication by timestamp alone.

380-392: LGTM!

The in-place replacement logic correctly implements the "keep the version with most content" strategy. The comparison of content item counts ensures that the most complete message is retained regardless of input order. The defensive type check is appropriate, and the logic handles all ordering scenarios correctly.

test/test_timeline_browser.py (1)

54-55: LGTM!

The timeout increase from 10s to 30s is appropriate for handling CDN cold loads on first access per xdist worker. The comment clearly explains the rationale, and this aligns with the persistent browser cache implementation that reduces subsequent load times.

test/test_cache.py (1)

572-583: LGTM!

The test correctly validates the 0.8.0 → 0.9.0 breaking change rule. The assertions properly verify that:

Caches from 0.9.0+ are compatible (same or newer than breaking change threshold)

Caches from 0.8.0 and earlier are invalidated (at or below breaking version when current >= 0.9.0)

This aligns with the breaking change rule added in claude_code_log/cache.py.

claude_code_log/cache.py (1)

424-426: LGTM!

The breaking change entry is well-documented with a clear explanation of why cache invalidation is needed. The mapping correctly invalidates caches from 0.8.0 and earlier when upgrading to 0.9.0+, which aligns with the PR's objective of adding a cache-breaking change marker for the IDE tag transformation.

test/test_version_deduplication.py (3)

274-283: LGTM!

The test is well-documented with a clear explanation of the git branch switch scenario. The import of TextContent is necessary for constructing the test messages, and using a fixed timestamp is appropriate for testing deduplication behavior.

285-352: LGTM!

The three test messages correctly represent the branch switch artifact scenario:

msg1 with 2 content items (IDE notification + actual text) represents the complete message

msg2 and msg3 with 1 content item each represent partial versions

All share the same timestamp but have different UUIDs, matching the real-world scenario

354-370: LGTM!

The test logic comprehensively validates the deduplication behavior by testing three different orderings, ensuring the in-place replacement logic works regardless of message order. The assertions correctly verify:

Deduplication occurred (content appears exactly once)

The most complete message (msg1) was selected (IDE notification present)

test/conftest.py (4)

42-56: LGTM!

The fixture correctly implements per-worker persistent cache directories:

Session scope is appropriate for sharing across tests

worker_id parameter enables pytest-xdist parallel execution without lock conflicts

Using a fixed directory (not temporary) enables caching across test runs

The path matches the .gitignore entry added in this PR

59-76: LGTM!

The persistent context fixture correctly implements HTTP caching for CDN resources:

Uses launch_persistent_context which is the appropriate Playwright API

Session scope shares the context across all tests in a worker

Configuration properly merges launch args with viewport and HTTPS settings

Context is properly cleaned up using the yield pattern

79-85: LGTM!

The context fixture override correctly redirects pytest-playwright's default context to use the persistent context. This ensures all browser tests benefit from the shared HTTP cache while maintaining the standard pytest-playwright API.

88-93: LGTM!

The page fixture correctly provides per-test page isolation while using the shared persistent context. Each test gets a fresh page (for proper isolation) but benefits from the shared HTTP cache. The cleanup pattern is correct.

Add explicit isinstance check so pyright can track that message is a UserTranscriptEntry when accessing message.message.content. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

claude_code_log/converter.py (1)

380-392: Implementation looks correct.

The replacement logic properly compares content item counts and performs in-place replacement only when beneficial. The isinstance(existing, UserTranscriptEntry) check at line 385 is a good defensive type guard.

One subtle invariant to note: when is_user_text is True for the current message, the existing entry at the same dedup_key must also be a UserTranscriptEntry (since dedup_key includes message_type). The current code handles this correctly, but if this invariant ever breaks, the replacement would silently fail. Consider adding a brief comment or assertion if you want to catch unexpected invariant violations in the future.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bdc1c49 and 5ad865a.

📒 Files selected for processing (1)

claude_code_log/converter.py (3 hunks)

🧰 Additional context used

📓 Path-based instructions (2)

claude_code_log/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use dateparser for natural language date parsing to support date range filtering with expressions like 'today', 'yesterday', 'last week', and relative dates

Files:

claude_code_log/converter.py

**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use ruff for code formatting and linting, with ruff check --fix for automatic fixes
Use pyright and mypy for type checking in Python code
Target Python 3.10+ with support for modern Python features and type hints

Files:

claude_code_log/converter.py

🧠 Learnings (1)

📓 Common learnings

Learnt from: CR
Repo: daaain/claude-code-log PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-30T17:16:32.495Z
Learning: Applies to test/**/*browser*.py : Use Playwright for browser-based testing of interactive HTML features, particularly for testing the timeline visualization and filter controls

🧬 Code graph analysis (1)

claude_code_log/converter.py (1)

claude_code_log/models.py (2)

SummaryTranscriptEntry (825-830)

UserTranscriptEntry (812-816)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)

GitHub Check: test (windows-latest, 3.14)
GitHub Check: test (ubuntu-latest, 3.14)
GitHub Check: test (windows-latest, 3.13)
GitHub Check: test (windows-latest, 3.12)
GitHub Check: test (windows-latest, 3.11)
GitHub Check: test (ubuntu-latest, 3.12)
GitHub Check: test (windows-latest, 3.10)
GitHub Check: test (ubuntu-latest, 3.11)
GitHub Check: test (ubuntu-latest, 3.10)
GitHub Check: test (ubuntu-latest, 3.13)

🔇 Additional comments (3)

claude_code_log/converter.py (3)

321-329: LGTM!

The updated docstrings clearly document the new deduplication behavior: user text messages with matching timestamps are deduplicated while preserving the entry with the most content items. The in-place replacement semantics are properly explained in the return description.

331-332: LGTM!

The change from a set to a dict mapping dedup keys to indices is the correct approach for enabling in-place replacement of existing entries.

356-372: LGTM!

The for-else pattern cleanly identifies user text messages (those without ToolResultContent). Using an empty content_key for these messages ensures they deduplicate by timestamp alone, while the is_user_text flag gates the content-count comparison logic.

cboos · 2025-12-17T21:30:54Z

Not sure why we had one failure:

FAILED test/test_timeline_browser.py::TestTimelineBrowser::test_timezone_conversion_functionality - AssertionError: Should show timezone abbreviation in parentheses
assert ')' in '2025-07-03 15:50:07'

Only for Python 3.10 on Windows, first attempt. Second worked. I can't reproduce it locally.

Replace TypeGuard[T] with T | None pattern in is_user_entry() and is_assistant_entry(). This enables walrus operator usage and works with both pyright and ty type checkers, allowing removal of the ty rules workaround. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

claude_code_log/parser.py (1)

480-491: Consider renaming functions to reflect non-boolean return.

The refactoring from TypeGuard to Optional[T] is correct and improves call-site efficiency (as seen in renderer.py). However, the is_ prefix conventionally suggests boolean returns in Python. Names like get_user_entry or as_user_entry would better signal that these return typed entries or None.

The implementation itself is sound—the type checks are safe given Pydantic's Literal types, and the walrus-operator usage in renderer.py reads naturally.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5ad865a and 4b5b59a.

📒 Files selected for processing (2)

claude_code_log/parser.py (2 hunks)
claude_code_log/renderer.py (4 hunks)

🧰 Additional context used

📓 Path-based instructions (3)

claude_code_log/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use dateparser for natural language date parsing to support date range filtering with expressions like 'today', 'yesterday', 'last week', and relative dates

Files:

claude_code_log/renderer.py
claude_code_log/parser.py

claude_code_log/renderer.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use mistune for quick Markdown rendering with syntax highlighting support in server-side template rendering

Files:

claude_code_log/renderer.py

**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use ruff for code formatting and linting, with ruff check --fix for automatic fixes
Use pyright and mypy for type checking in Python code
Target Python 3.10+ with support for modern Python features and type hints

Files:

claude_code_log/renderer.py
claude_code_log/parser.py

🧠 Learnings (1)

📚 Learning: 2025-11-30T17:16:32.495Z

Learnt from: CR
Repo: daaain/claude-code-log PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-30T17:16:32.495Z
Learning: Applies to claude_code_log/models.py : Use Pydantic models for parsing and validating transcript JSON data, including TranscriptEntry (union of UserTranscriptEntry, AssistantTranscriptEntry, SummaryTranscriptEntry), UsageInfo, and ContentItem

Applied to files:

claude_code_log/parser.py

🧬 Code graph analysis (2)

claude_code_log/renderer.py (2)

claude_code_log/parser.py (3)

is_user_entry (480-484)

extract_text_content (64-80)

is_assistant_entry (487-491)

claude_code_log/utils.py (1)

should_use_as_session_starter (110-129)

claude_code_log/parser.py (1)

claude_code_log/models.py (3)

UserTranscriptEntry (812-816)

MessageType (17-52)

AssistantTranscriptEntry (819-822)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)

GitHub Check: test (ubuntu-latest, 3.12)
GitHub Check: test (windows-latest, 3.10)
GitHub Check: test (ubuntu-latest, 3.10)
GitHub Check: test (ubuntu-latest, 3.13)
GitHub Check: test (windows-latest, 3.13)
GitHub Check: test (ubuntu-latest, 3.11)
GitHub Check: test (windows-latest, 3.11)
GitHub Check: test (windows-latest, 3.14)
GitHub Check: test (windows-latest, 3.12)
GitHub Check: test (ubuntu-latest, 3.14)

🔇 Additional comments (4)

claude_code_log/renderer.py (4)

1800-1804: LGTM!

The walrus-operator pattern efficiently combines the entry retrieval and type check, avoiding redundant function calls. Field access on the returned entry is safe and correct.

1821-1828: LGTM!

Consistent usage pattern with proper conditional checks. The elif structure correctly handles the case where a session already exists but needs its first user message populated.

1839-1842: LGTM!

Clean extraction of multiple fields from the returned entry. This pattern is particularly beneficial here where three attributes need to be accessed from the same entry.

2020-2022: LGTM!

Consistent application of the walrus-operator pattern for accessing assistant entry fields. The refactoring maintains correctness while reducing redundant function calls throughout the file.

The is_ prefix conventionally implies boolean returns in Python. Rename to as_user_entry() and as_assistant_entry() to better signal that these return typed entries or None. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

UserTextContent is now instantiated in parser.py and renderer.py, so the TODO claiming it wasn't used is stale. Also use isinstance() for type narrowing in test_ide_tags.py to satisfy ty. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

test/test_ide_tags.py (1)
314-314: Consider adding test coverage for plain text without IDE tags.

The assertion correctly validates that parse_user_message_content returns UserTextContent when IDE notifications are present. However, there's no test verifying the behavior when parsing plain user text without any IDE tags.

Consider adding a test case like:
def test_parse_user_message_without_ide_tags(self):
    """Test parsing plain user message without IDE tags."""
    plain_text = "This is a simple user message."
    content_list = [TextContent(type="text", text=plain_text)]
    
    content_model = parse_user_message_content(content_list)
    
    # Verify the type and that ide_notifications is None
    assert isinstance(content_model, UserTextContent)
    assert content_model.ide_notifications is None
    assert content_model.text == plain_text
This would ensure both code paths are tested and document the expected behavior for plain text inputs.
claude_code_log/renderer.py (1)
1800-1804: Minor redundancy: text_content is already extracted from the same source.

For user entries, message_content (line 1787) equals user_entry.message.content, so re-extracting at line 1803 yields the same value as text_content (line 1788). Consider reusing text_content directly:
 if (user_entry := as_user_entry(message)) and should_use_as_session_starter(
     text_content
 ):
-    content = extract_text_content(user_entry.message.content)
-    first_user_message = create_session_preview(content)
+    first_user_message = create_session_preview(text_content)

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4b5b59a and e99f80f.

📒 Files selected for processing (4)

claude_code_log/models.py (0 hunks)
claude_code_log/parser.py (2 hunks)
claude_code_log/renderer.py (5 hunks)
test/test_ide_tags.py (2 hunks)

💤 Files with no reviewable changes (1)

claude_code_log/models.py

🚧 Files skipped from review as they are similar to previous changes (1)

claude_code_log/parser.py

🧰 Additional context used

📓 Path-based instructions (4)

test/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Organize tests into categories with pytest markers to avoid async event loop conflicts: unit tests (no mark), TUI tests (@pytest.mark.tui), browser tests (@pytest.mark.browser), and snapshot tests

Files:

test/test_ide_tags.py

**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use ruff for code formatting and linting, with ruff check --fix for automatic fixes
Use pyright and mypy for type checking in Python code
Target Python 3.10+ with support for modern Python features and type hints

Files:

test/test_ide_tags.py
claude_code_log/renderer.py

claude_code_log/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use dateparser for natural language date parsing to support date range filtering with expressions like 'today', 'yesterday', 'last week', and relative dates

Files:

claude_code_log/renderer.py

claude_code_log/renderer.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use mistune for quick Markdown rendering with syntax highlighting support in server-side template rendering

Files:

claude_code_log/renderer.py

🧠 Learnings (3)

📓 Common learnings

Learnt from: CR
Repo: daaain/claude-code-log PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-30T17:16:32.495Z
Learning: Applies to test/**/*browser*.py : Use Playwright for browser-based testing of interactive HTML features, particularly for testing the timeline visualization and filter controls

📚 Learning: 2025-11-30T17:16:32.495Z

Learnt from: CR
Repo: daaain/claude-code-log PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-30T17:16:32.495Z
Learning: Applies to claude_code_log/models.py : Use Pydantic models for parsing and validating transcript JSON data, including TranscriptEntry (union of UserTranscriptEntry, AssistantTranscriptEntry, SummaryTranscriptEntry), UsageInfo, and ContentItem

Applied to files:

test/test_ide_tags.py

📚 Learning: 2025-12-09T23:52:47.578Z

Learnt from: daaain
Repo: daaain/claude-code-log PR: 59
File: test/test_cache.py:135-165
Timestamp: 2025-12-09T23:52:47.578Z
Learning: SQLite supports NULLS FIRST and NULLS LAST in ORDER BY since v3.30.0 (Oct 2019). Do not flag SQL that uses these clauses as an error when reviewing Python tests or code that interacts with SQLite. If reviewing SQL strings, verify the target SQLite version supports NULLS FIRST/LAST and ensure the syntax is used correctly for the database in use.

Applied to files:

test/test_ide_tags.py

🧬 Code graph analysis (2)

test/test_ide_tags.py (1)

claude_code_log/models.py (2)

TextContent (679-681)

UserTextContent (265-275)

claude_code_log/renderer.py (2)

claude_code_log/parser.py (3)

as_assistant_entry (487-491)

as_user_entry (480-484)

extract_text_content (64-80)

claude_code_log/utils.py (1)

should_use_as_session_starter (110-129)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)

GitHub Check: test (windows-latest, 3.12)
GitHub Check: test (windows-latest, 3.14)
GitHub Check: test (ubuntu-latest, 3.11)
GitHub Check: test (windows-latest, 3.10)
GitHub Check: test (windows-latest, 3.11)
GitHub Check: test (windows-latest, 3.13)
GitHub Check: test (ubuntu-latest, 3.13)
GitHub Check: test (ubuntu-latest, 3.14)
GitHub Check: test (ubuntu-latest, 3.10)
GitHub Check: test (ubuntu-latest, 3.12)

🔇 Additional comments (5)

test/test_ide_tags.py (1)

20-21: LGTM!

The addition of UserTextContent to the imports is necessary to support the updated type assertion in the test. Both imports serve distinct purposes: TextContent for constructing test input and UserTextContent for validating the parser output type.

claude_code_log/renderer.py (4)

43-44: LGTM!

The new imports align with the updated entry-based API from the parser module, enabling type-safe access to entry properties.

1821-1828: LGTM!

The walrus operator pattern correctly captures the typed entry, and the extracted content is consistently used for both the starter check and preview creation.

1839-1842: LGTM!

Clean extraction of assistant entry properties. The subsequent usage check (line 1845) properly handles the case where usage may be None.

2020-2022: LGTM!

Consistent use of the as_assistant_entry pattern, matching the approach in _collect_session_info.

The text_content variable is already extracted from message_content, which equals user_entry.message.content for user entries. No need to extract it again. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

DedupNoticeContent now stores target_uuid and resolves it to target_message_id after hierarchy building. The formatter creates a clickable anchor link to the Task result message. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

cboos and others added 4 commits December 17, 2025 17:10

Integrate ty config in pyproject.toml

2d3fc57

coderabbitai bot reviewed Dec 17, 2025

View reviewed changes

cboos and others added 2 commits December 18, 2025 00:02

coderabbitai bot reviewed Dec 17, 2025

View reviewed changes

cboos and others added 2 commits December 18, 2025 00:39

cboos merged commit ba68e0f into main Dec 18, 2025
11 checks passed

coderabbitai bot mentioned this pull request Jan 4, 2026

SQLite cache #59

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix user text message deduplication to keep best version #65

Fix user text message deduplication to keep best version #65

Uh oh!

cboos commented Dec 17, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Dec 17, 2025 •

edited

Loading

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot left a comment

Uh oh!

cboos commented Dec 17, 2025

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Fix user text message deduplication to keep best version #65

Fix user text message deduplication to keep best version #65

Uh oh!

Conversation

cboos commented Dec 17, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Background

Test plan

Notes

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Dec 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

Pre-merge checks and finishing touches

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

cboos commented Dec 17, 2025

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

cboos commented Dec 17, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Dec 17, 2025 •

edited

Loading